Load data directly from the Census Reporter API, but use the columns titles for names, rather than the column id numbers.
Downloading data from the Census Reporter API returns both data and metadata.
To select an area to get results for, you must provide both a geoid for the containing area, and a summary level for the subdivisions.
To get the summary level, see this spreadsheet of summary levels.. You can get the geoids by browsing and searching at the Census Reporter website
In [77]:
import requests # run '!pip install requests' in the Notebook to install
import pandas as pd
# Get the Age by sex table, and a lot of metadata, directly from the Census Reporter API
# for all tracts in California
table_id = 'B01001'
summary_level = '140' # Tracts
geoid='05000US06073' # San Diego County
data = requests.get("http://api.censusreporter.org/1.0/data/show/latest"
"?table_ids={table_id}&geo_ids={sl}|{geoid}"
.format(table_id=table_id, sl=summary_level, geoid=geoid)).json()
In [78]:
from operator import itemgetter
# It looks like the JSON dicts may be properly sorted, but I'm not sure I can rely on that.
# So, sort the column id values, then make a columns title list in the same order
columns = data['tables']['B01001']['columns']
columns_id_names = sorted(columns.keys() )
column_titles = [ columns[e]['name'] for e in columns_id_names]
# The item getter will produce a tuple from a dict, getting all of the
# dict items in the order of the keys named in the argument list. In this case,
# it turns the dict into a tuple of estimate values
row_ig = itemgetter(*columns_id_names)
tracts = data['data'].keys()
# Actually get the row data.
estimate_rows = [ row_ig(data['data'][tract]['B01001']['estimate']) for tract in tracts ]
error_rows = [ row_ig(data['data'][tract]['B01001']['error']) for tract in tracts ]
# Now, creating the dataframe is really easy.
df = pd.DataFrame(estimate_rows, columns=column_titles)
df.head()
Out[78]:
In [75]:
new_titles = []
last_heading = ''
for t in column_titles:
t = t.replace('years','')
if t.endswith(':'):
t = t.strip(':')
new_titles.append(t)
last_heading = t
else:
new_titles.append(last_heading+' '+t)
df = pd.DataFrame(estimate_rows, columns=new_titles)
df.head()
Out[75]:
In [76]:
df[:5].T
Out[76]:
In [ ]: